A Machine Learning Approach to Recovery of Scene Geometry from Images

Author

Hoang Trinh

Abstract

Recovering the 3D structure of a scene from images yields useful information for tasks such as shape and scene recognition, object detection, and motion planning and object grasping in robotics. In this thesis, we introduce a general machine learning approach called unsupervised CRF learning, based on maximizing the conditional likelihood. We describe the application of this approach to computer vision systems that recover 3D scene geometry from images. We focus on recovering 3D geometry from single images, stereo pairs and video sequences. Building these systems requires algorithms for inference as well as for learning the parameters of conditional Markov random fields (MRFs). Unlike previous work, our system is trained without supervision, using no ground-truth labeled data. We employ a slanted-plane stereo vision model in which a fixed over-segmentation divides the left image into coherent regions called superpixels. We then assign a disparity plane to each superpixel. We formulate the problem of inferring the plane parameters as an MRF labeling problem, which can be solved by an energy minimization method. The MRF is a graphical model in which superpixels define the nodes and adjacencies between superpixels define the edges. Our stereo energy function balances a data-matching term against a smoothness term. For systems with continuous-valued variables, or discrete-valued variables with a very large state space, standard discrete MRF inference techniques such as Loopy BP, graph cuts or tree-reweighted message passing cannot be used directly. For such systems, we propose a generic Particle-based Belief Propagation (PBP) algorithm, closely related to previous work, which we then formulate specifically for our MRF labeling problems.
Although we describe only one specific use of this generic PBP algorithm, we believe it can serve as an approximate inference scheme for a wide variety of problems that can be formulated as probabilistic graphical models, especially those containing many random variables with very large or continuous domains. We demonstrate our unsupervised CRF learning algorithm on a parameterized slanted-plane stereo vision model involving shape-from-texture cues. This unsupervised learning algorithm implicitly trains shape-from-texture cues for monocular surface orientation. We show that training monocular cues from stereo pair data improves stereo depth estimation. Trained only without supervision, our stereo model with texture cues outperforms the results reported in related work on the same stereo dataset. We also apply our unsupervised learning method to the monocular depth estimation (MDE) problem. The MDE model, learned using stereo pairs only, shows a modest improvement after a few training steps and achieves performance comparable to previous work on the same dataset. Using MDE in combination with the dense stereo model also yields a small improvement in depth estimation over the initial stereo model. In this thesis, we also address the use of stereo video sequences. We formulate structure and motion estimation as an energy minimization problem, in which the model is an extension of our slanted-plane stereo vision model that also handles surface velocity. Surface estimation is done using our own slanted-plane stereo algorithm. Velocity estimation is achieved by solving an MRF labeling problem using Loopy BP. Performance is analyzed using our novel evaluation metrics, which are based on the notion of view prediction error. Experiments on road-driving stereo sequences show encouraging results.
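
As a rough illustration of the slanted-plane energy described in the abstract, the following Python sketch evaluates a data-matching term plus a weighted smoothness term for a given assignment of disparity planes to superpixels. The abstract does not give the actual form of either term, so everything here is an assumption made for illustration: the plane parameterization d(x, y) = a*x + b*y + c, the absolute-intensity data cost, the single boundary sample per edge, and the names `plane_disparity`, `stereo_energy` and `smooth_weight`.

```python
import numpy as np

def plane_disparity(plane, x, y):
    """Disparity of a slanted plane d(x, y) = a*x + b*y + c (assumed form)."""
    a, b, c = plane
    return a * x + b * y + c

def stereo_energy(left, right, labels, planes, edges, smooth_weight=1.0):
    """Illustrative energy: data-matching term + weighted smoothness term.

    left, right : float arrays of shape (H, W), rectified grayscale images
    labels      : int array of shape (H, W), superpixel index of each left pixel
    planes      : float array of shape (n_superpixels, 3), rows are (a, b, c)
    edges       : list of (i, j, (x, y)); superpixels i and j are adjacent and
                  (x, y) is one sample point on their shared boundary
    """
    height, width = left.shape
    ys, xs = np.mgrid[0:height, 0:width]

    # Per-pixel disparity from the plane of the superpixel that owns the pixel.
    disp = planes[labels, 0] * xs + planes[labels, 1] * ys + planes[labels, 2]

    # Data term (assumption): absolute intensity difference between each left
    # pixel and the right pixel it maps to, with columns clamped to the image.
    xr = np.clip(np.rint(xs - disp).astype(int), 0, width - 1)
    data = np.abs(left - right[ys, xr]).sum()

    # Smoothness term (assumption): disagreement between the two adjacent
    # planes, measured at the boundary sample point of each edge.
    smooth = 0.0
    for i, j, (bx, by) in edges:
        smooth += abs(plane_disparity(planes[i], bx, by)
                      - plane_disparity(planes[j], bx, by))

    return data + smooth_weight * smooth
```

An inference method such as the particle-based belief propagation mentioned above would search over `planes` to reduce this value. The sketch only scores one candidate assignment and does not perform the search.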

Similar resources

A Hybrid Algorithm based on Deep Learning and Restricted Boltzmann Machine for Car Semantic Segmentation from Unmanned Aerial Vehicles (UAVs)-based Thermal Infrared Images

Nowadays, ground vehicle monitoring (GVM) is one of the areas of application in the intelligent traffic control system using image processing methods. In this context, the use of unmanned aerial vehicles based on thermal infrared (UAV-TIR) images is one of the optimal options for GVM due to the suitable spatial resolution, cost-effective and low volume of images. The methods that have been prop...

Application of the Extreme Learning Machine for Modeling the Bead Geometry in Gas Metal Arc Welding Process

Rapid prototyping (RP) methods are used to produce, easily and quickly, a scale model of a physical part or assembly. Gas metal arc welding (GMAW) is a widespread process used for rapid prototyping of metallic parts. In this process, in order to obtain a desired welding geometry, it is very important to predict the weld bead geometry based on the input process parameters, which are voltage...

Image Classification via Sparse Representation and Subspace Alignment

Image representation is a crucial problem in image processing, where there exist many low-level representations of images, e.g., SIFT, HOG and so on. But there is a missing link between low-level and high-level semantic representations. In fact, traditional machine learning approaches, e.g., non-negative matrix factorization, sparse representation and principal component analysis, are employed to d...

Automatic Interpretation of UltraCam Imagery by Combination of Support Vector Machine and Knowledge-based Systems

With the development of digital sensors, an increasing number of high-resolution images are available. Interpretation of these images is not possible manually, which necessitates seeking for practical, fast and automatic solutions to solve the environmental and location-based management problems. The land cover classification using high-resolution imagery is a difficult process because of the c...

Non-melanoma skin cancer diagnosis with a convolutional neural network

Background: The most common types of non-melanoma skin cancer are basal cell carcinoma (BCC), and squamous cell carcinoma (SCC). AKIEC -Actinic keratoses (Solar keratoses) and intraepithelial carcinoma (Bowen’s disease)- are common non-invasive precursors of SCC, which may progress to invasive SCC, if left untreated. Due to the importance of early detection in cancer treatment, this study aimed...

Journal title:

CoRR

Volume abs/1007.2958

Pages -

Publication date 2010
